Typical solution time for a vertex-covering algorithm on finite-connectivity random graphs

In this letter, we analytically describe the typical solution time needed by a backtracking algorithm
to solve the vertex-cover problem on finite-connectivity random graphs. We find two different 
transitions: The first one is algorithm-dependent and marks the dynamical transition from linear 
to exponential solution times. The second one gives the maximum computational complexity, and 
is found exactly at the threshold where the system undergoes an algorithm-independent phase 
transition in its solvability. Analytical results are corroborated by numerical simulations. 
PACS: (89.20.Ff), (02.10.-c), (05.20.-y), (89.15.Hc) 



Over the last few years, phase-transition phenomena 
in combinatorial problems have increasingly attracted 
computer scientists and, more recently, also statistical
physicists [?]. Many computationally hard problems,
such as 3-satisfiability, graph coloring, number partitioning
or vertex cover [?], undergo dramatic changes in their
solvability or their solution structure when external
parameters are changed. These problems, all belonging to
the class of NP-complete problems [?], are believed to be
solvable only in a time which scales exponentially with
the problem size. Scientific interest therefore increased
considerably with the observation that phase transitions
are strongly related to a pronounced peak in the
typical computational time: The hardest instances were 
typically found in the vicinity of the transition point, 
where problems are said to be critically constrained. Far 
away from this point, problems are easily solved or hope- 
lessly over-constrained. Problems at such phase bound- 
aries thus provide an optimal testing ground for the de- 
velopment or improvement of algorithms. 

Classical complexity theory characterizes the hardness 
of a computational problem with respect to the worst 
possible case. The above-mentioned observations have,
however, underlined the need for a typical-case complexity
theory. At this point statistical mechanics enters:
Many algorithm-independent aspects of these phenomena,
e.g. the location of the phase transition and the
solution-space structure, have already been characterized
using methods from the statistical mechanics of disordered
systems [?]. A description of the average behavior of
specific algorithms is, however, less obvious. Probabilistic
methods help to analyze simple descent algorithms
and thus establish rigorous bounds on phase boundaries
[9,10], but the calculation of computing times for complete
backtracking algorithms was out of range. Recently
a breakthrough was obtained in [?] for the 3-satisfiability
problem: Combining elements of probabilistic analysis 
with methods from statistical mechanics, the typical time 
complexity of a backtracking algorithm could be ob- 
tained. 

The vertex-cover decision problem: In this letter
we concentrate on the vertex-cover (VC) problem on
finite-connectivity random graphs, which is one of the
basic NP-complete combinatorial problems, see [?]. It is
expected that no algorithm can be designed which always
solves this problem in a time growing sub-exponentially
with the graph size. VC was recently shown to have
phase-transition properties similar to those of satisfiability,
but it is much easier to understand due to its simpler
geometrical structure [?]. After having introduced the model and
reviewed some recent results, we will introduce a simple
branch-and-bound algorithm and calculate its computational
time complexity by means of analytical as well as
numerical tools.

Vertex covers are defined as follows: Take any undirected
graph $G = (V, E)$ with $N$ vertices $i \in V = \{1, 2, \ldots, N\}$
and $M$ edges $\{i, j\} \in E \subset V \times V$. We consider
subsets $V_{vc} \subset V$; vertices $i$ with $i \in V_{vc}$ are called
covered, and uncovered for $i \notin V_{vc}$. Analogously, an
edge $\{i, j\} \in E$ is called covered iff at least one of its
end-vertices is covered, $i \in V_{vc}$ or $j \in V_{vc}$. The set $V_{vc}$ is a
vertex cover iff all edges of the graph are covered.
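
The definition can be made concrete with a few lines of code. The following is only an illustrative sketch: the function name `is_vertex_cover` and the edge-list representation are assumptions made here, not part of the original text.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def is_vertex_cover(edges: Iterable[Edge], cover: Set[int]) -> bool:
    """Return True iff every edge has at least one end-vertex in `cover`."""
    return all(i in cover or j in cover for i, j in edges)


# Path graph 1 - 2 - 3: the middle vertex touches both edges.
path_edges = [(1, 2), (2, 3)]
middle_covers = is_vertex_cover(path_edges, {2})
end_covers = is_vertex_cover(path_edges, {1})
```

Here `middle_covers` is true because vertex 2 is an end-vertex of both edges, while `end_covers` is false because the edge {2, 3} stays uncovered.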

The vertex-cover decision problem asks whether there
are VCs of fixed given cardinality $xN = |V_{vc}|$. In other
words, we ask whether it is possible to cover all edges
of $G$ by covering $xN$ suitably chosen vertices, i.e. by
distributing $xN$ covering marks among the vertices.

In order to be able to speak of typical or average cases,
we have to introduce an ensemble of graphs. We investigate
random graphs $G_{N,c/N}$ with $N$ vertices and edges
$\{i, j\}$ which are drawn randomly and independently with
probability $c/N$; thus the average connectivity $c$ remains
finite in the large-$N$ limit. For a complete introduction
to the field see [?].
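
A minimal sketch of how one instance of this ensemble could be sampled is given below. The function name `random_graph`, the vertex labels 0, ..., N-1 and the `seed` parameter are assumptions for illustration; the quadratic loop over all vertex pairs is only suitable for small graphs.

```python
import random
from itertools import combinations


def random_graph(n, c, seed=None):
    """Sample the edge list of one graph with n vertices.

    Each of the n(n-1)/2 possible edges is drawn independently with
    probability c/n, so the average connectivity is close to c.
    """
    rng = random.Random(seed)
    p = c / n
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


edges = random_graph(200, 2.0, seed=1)
mean_connectivity = 2.0 * len(edges) / 200
```

For larger graphs one would typically draw the number of edges first and then place them at random, which avoids visiting every vertex pair.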

When the number $xN$ of covering marks is lowered
($c$ is kept constant), the model is expected to undergo
a coverable-uncoverable transition. Using probabilistic
tools, rigorous lower and upper bounds for this threshold
[13] and the asymptotic behavior for large connectivities
[14] have been deduced. Recently we have investigated
VC using a statistical mechanics approach [?]: For
connectivity $c < e \simeq 2.72$ the transition is given by

$$x_c(c) = 1 - \frac{2W(c) + W(c)^2}{2c} , \qquad (1)$$

where $W(c)$ is the Lambert $W$-function defined by
$W(c)\exp(W(c)) = c$. For $x > x_c(c)$, vertex covers of
size $xN$ exist with probability one; for $x < x_c(c)$ the
available covering marks are not sufficient. For connectivities
$c > e$ replica symmetry breaking is present, and
no exact result for $x_c(c)$ has been obtained.
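
The threshold can be evaluated numerically once the Lambert $W$-function is available. The sketch below assumes the closed form $x_c(c) = 1 - (2W(c) + W(c)^2)/(2c)$ for $0 < c < e$; this expression is an assumption here, since the formula itself is not legible in the extracted text. The helper names `lambert_w` and `x_c` are hypothetical, and $W$ is obtained by a plain Newton iteration rather than a library call.

```python
import math


def lambert_w(c, tol=1e-13):
    """Principal branch W(c) for c >= 0, solving W * exp(W) = c by Newton's method."""
    if c < 0.0:
        raise ValueError("this sketch only handles c >= 0")
    w = math.log1p(c)  # starting guess that lies at or above the root
    for _ in range(100):
        e = math.exp(w)
        step = (w * e - c) / (e * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def x_c(c):
    """Assumed closed form of the coverable-uncoverable threshold, for 0 < c < e."""
    w = lambert_w(c)
    return 1.0 - (2.0 * w + w * w) / (2.0 * c)


threshold_at_2 = x_c(2.0)
```

Two simple consistency checks: at $c = e$ one has $W = 1$, so the formula gives $1 - 3/(2e)$, and for small $c$ it approaches $c/2$, the value expected when the graph consists almost entirely of isolated edges.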

The algorithm: We analyze a branch-and-bound algorithm
similar to [11]; for an introduction to this kind
of algorithms see [13]. As each vertex is either covered or
uncovered, there are $\binom{N}{xN}$ possible configurations which
can be arranged as the leaves of a binary (backtracking)
tree. The basic idea is to traverse the whole tree to
search for vertex covers. First we explain how the tree
traversal is organized (branch), then we show how much
computational time can be saved by excluding subtrees
where surely no covers can be found (bound).

We introduce three states of vertices: free, covered or
uncovered. The algorithm starts at the root of the tree
where all vertices are free. The algorithm descends into
the tree by choosing free vertices at random. Each vertex
$i$ has two subtrees corresponding to covering/uncovering
$i$. If $i$ has neighboring vertices which are either free or
uncovered, we mark $i$ covered first (left subtree). If the
number of covered vertices does not exceed $xN$, the descent
continues. If the algorithm returns, vertex $i$ is set uncovered
(right subtree). In case $i$ has only covered neighbors,
the order of the two subtrees is exchanged. The algorithm
stops either if it has covered all edges before having used
all covering marks (output: graph coverable) or if it has
exhausted all covering marks in the right-most branch
without having covered all edges (output: graph uncoverable).

The performance of this algorithm can be improved
easily by introducing a bound. If at any node one of
the subtrees can be proven to contain no VC, the corresponding
subtree can be omitted. The bound used here
is simple: It forbids marking a vertex uncovered if it
has any neighbor which was already marked uncovered.
Otherwise some edges would remain uncovered. The algorithm
is summarized below, where $G = (V, E)$ denotes
the graph, $m(i) \in \{\mathrm{free}, \mathrm{cov}, \mathrm{uncov}\}$ contains the marks,
and $X$ equals the currently available number of marks.
Initially we set $m(i) = \mathrm{free}$ for all $i \in V$, and $X = xN$.

procedure vertex-cover(G, m, X)
begin
    if all edges are covered then
        stop;
    if X = 0 then
        return;
    select a vertex i with m(i) = free randomly;
    if i has neighbors j with m(j) ≠ cov then
    begin
        m(i) ← cov;
        vertex-cover(G, m, X − 1);
        if i has no neighbors j with m(j) = uncov then
        begin
            m(i) ← uncov;
            vertex-cover(G, m, X);
        end
    end
    else (all neighbors j of i have m(j) = cov)
    begin
        m(i) ← uncov;
        vertex-cover(G, m, X);
        m(i) ← cov;
        vertex-cover(G, m, X − 1);
    end
end



This algorithm is complete, i.e. decides whether or not 
a graph is coverable with the desired number of covering 
marks. Due to backtracking it will in general need ex- 
ponential time in order to decide this question. In the 
following, the solution time is measured as the number of 
visited nodes in the backtracking tree. 
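
The procedure can be turned into a short runnable program. The sketch below follows the pseudocode as closely as possible and counts the visited nodes of the backtracking tree. The function name `vertex_cover_search`, the edge-list input with vertices labelled 0, ..., n-1, and the explicit reset of a vertex to free when both subtrees fail (left implicit in the pseudocode) are assumptions made here. Recursion depth grows with the number of vertices, so this is meant for small graphs only.

```python
import random

FREE, COV, UNCOV = 0, 1, 2


def vertex_cover_search(n, edges, marks_available, seed=None):
    """Return (coverable, visited_nodes) for a graph on vertices 0..n-1."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    mark = [FREE] * n
    visited = 0

    def all_covered():
        return all(mark[i] == COV or mark[j] == COV for i, j in edges)

    def descend(x):
        nonlocal visited
        visited += 1
        if all_covered():
            return True   # "stop": a vertex cover has been found
        if x == 0:
            return False  # covering marks exhausted: backtrack
        # An uncovered edge always has a free end-vertex, because the bound
        # never lets two neighbouring vertices both be marked uncovered.
        i = rng.choice([v for v in range(n) if mark[v] == FREE])
        if any(mark[j] != COV for j in adj[i]):
            mark[i] = COV
            found = descend(x - 1)
            # Bound: uncover i only if no neighbour is already uncovered.
            if not found and all(mark[j] != UNCOV for j in adj[i]):
                mark[i] = UNCOV
                found = descend(x)
        else:
            # All neighbours are covered: exchange the order of the subtrees.
            mark[i] = UNCOV
            found = descend(x)
            if not found:
                mark[i] = COV
                found = descend(x - 1)
        if not found:
            mark[i] = FREE  # assumption: undo the mark before returning
        return found

    coverable = descend(marks_available)
    return coverable, visited


triangle = [(0, 1), (1, 2), (0, 2)]
with_two_marks = vertex_cover_search(3, triangle, 2, seed=0)
with_one_mark = vertex_cover_search(3, triangle, 1, seed=0)
```

A triangle needs two covered vertices, so the search with two marks succeeds during the first descent, while the search with one mark has to backtrack before it reports that the graph is uncoverable.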

The first descent into the tree: The analysis of the
first descent into the left-most subtree is straightforward
for our algorithm, as it forms a Markov process of random
graphs. In every time step, one vertex and all its incident
edges are covered and can be regarded as removed from
the graph. As the order of appearance of the vertices
is not correlated with the geometrical structure of the graph,
the graph remains a random graph. After $T$ steps, we consequently
find a graph $G_{N-T,c/N}$ having $N-T$ vertices. As the edge
probability remains unchanged, the average connectivity
decreases from $c$ to $(1 - T/N)c$.

For large $N$, it is reasonable to work with the rescaled
time $t = T/N$, which becomes continuous in the thermodynamic
limit. In this notation, our generated graph
reads $G_{(1-t)N,c/N}$. An isolated vertex is now found
with probability $(1 - c/N)^{(1-t)N-1} \simeq \exp\{-(1-t)c\}$,
so the expected number of free covering marks becomes
$X(t) = X - N \int_0^t dt' \, (1 - \exp\{-(1-t')c\})$. The first
descent thus describes a trajectory in the $c$-$x$ plane,

$$c(t) = (1-t)\,c ,$$
$$x(t) = \frac{x - t}{1 - t} + \frac{e^{-(1-t)c} - e^{-c}}{(1-t)\,c} , \qquad (2)$$

where $x(t) = X(t)/[(1-t)N]$ is the fraction of covering marks
available per remaining vertex. The results are presented in figure 1.
There are two cases: for a large starting value of $x$, $x(t)$ reaches
one at a certain rescaled time $t < 1$, and the graph is proven to be
coverable after having visited $tN$ nodes of the backtracking
tree. This holds as long as the starting point $(x, c)$ is
situated above the line

$$x_b(c) = 1 + \frac{e^{-c} - 1}{c} . \qquad (3)$$



Below $x_b(c)$, $x(t)$ vanishes before all edges have been
covered. So the algorithm has to backtrack and, intuitively,
exponential solution times have to be expected.
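
The first-descent analysis is easy to evaluate numerically. The sketch below assumes the trajectory $x(t) = [x - t + (e^{-(1-t)c} - e^{-c})/c]/(1-t)$, obtained by carrying out the integral in $X(t)$, and the boundary $x_b(c) = 1 + (e^{-c} - 1)/c$. The function names are hypothetical, and the closed form used in `success_time` is derived here by solving $x(t) = 1$ for $t$; it is not quoted from the text.

```python
import math


def trajectory(x, c, t):
    """Point (c(t), x(t)) of the first descent at rescaled time 0 <= t < 1."""
    c_t = (1.0 - t) * c
    x_t = (x - t + (math.exp(-c_t) - math.exp(-c)) / c) / (1.0 - t)
    return c_t, x_t


def x_b(c):
    """Line above which the first descent already finds a cover."""
    return 1.0 + (math.exp(-c) - 1.0) / c


def success_time(x, c):
    """Rescaled time at which x(t) reaches one, valid for x_b(c) < x <= 1."""
    return 1.0 + math.log(c * (1.0 - x) + math.exp(-c)) / c


c0 = 2.0
boundary = x_b(c0)
succeeds = {x0: x0 > boundary for x0 in (0.8, 0.7, 0.6, 0.5, 0.3)}
```

For $c = 2$ the boundary lies between 0.5 and 0.6, so of the five starting values used in figure 1 the three largest end with a cover found in linear time and the two smallest require backtracking.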



The backtracking time: In order to calculate the
solution time also for $x < x_b(c)$, we combine equations
(1) and (2). We have also included $x_c(c)$ in fig. 1. For
$x < x_b(c)$, the trajectory of the first descent crosses the
phase transition line at a certain rescaled time $\tilde t$ at $(\tilde c, \tilde x)$.
There the generated random subgraph of $\tilde N = (1 - \tilde t)N$
vertices and average connectivity $\tilde c$ becomes uncoverable
by the remaining $\tilde x \tilde N$ covering marks. Please note that
$\tilde t = 0$ for $x < x_c(c)$, i.e. if we already start with an uncoverable
graph. To prove this uncoverability the algorithm
has to completely backtrack the subtree. This part of the
algorithm obviously contributes the exponentially dominating
part to the solution time. In the following we may
thus concentrate completely on the generated subgraph,
skipping "sub-" in subgraph, subtree, subproblem etc.

Numerical simulations show that the exponential
solution times approach a log-normal distribution
for large $N$. Hence, the typical solution time
$e^{N\tau(x,c)}$ follows from the quenched average
$\tau(x, c) = \lim_{N\to\infty} \frac{1}{N} \overline{\log t_{bt}(G_{\tilde N,\tilde c/\tilde N}, \tilde x)}$,
where $t_{bt}(G_{\tilde N,\tilde c/\tilde N}, \tilde x)$ is
the backtracking time for the generated uncoverable instance
$G_{\tilde N,\tilde c/\tilde N}$, and the overbar denotes the average over
the random graph ensemble. Solution times, as already
mentioned above, are measured as the number of nodes
visited by the algorithm. Since the leaves are also visited
nodes, $t_{bt}$ exceeds the number $\mathcal{N}_l$ of leaves. As the
depth of the backtracking tree is at most $N$, we also
have $t_{bt} \le N \mathcal{N}_l$. The exponential time contribution is
thus given by

$$\tau(x,c) = \lim_{N\to\infty} \frac{1}{N} \overline{\log \mathcal{N}_l(G_{\tilde N,\tilde c/\tilde N}, \tilde x)} . \qquad (4)$$

We have consequently reduced the problem of calculating
the backtracking time to an entropic calculation which
can be carried out using the tools of equilibrium statistical
mechanics. The number of leaves is trivially bounded
from above by $\binom{\tilde N}{\tilde x \tilde N}$, i.e. by the number of possible
placements of the $\tilde x \tilde N$ covering marks on the $\tilde N$ vertices. Using
Stirling's formula, we thus find

$$\tau(x,c) \le -\frac{\tilde c}{c} \left( \tilde x \log \tilde x + (1 - \tilde x) \log(1 - \tilde x) \right) . \qquad (5)$$
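
The Stirling step behind this bound can be checked numerically: the logarithm of a binomial coefficient, divided by the number of vertices, approaches the binary entropy. The sketch below uses the standard-library function `math.lgamma`; the helper names `log_binomial` and `binary_entropy` are chosen here for illustration.

```python
import math


def log_binomial(n, k):
    """Natural logarithm of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def binary_entropy(x):
    """-x log x - (1 - x) log(1 - x), with the limit value 0 at x = 0 and x = 1."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


n = 10**6
rate_exact = log_binomial(n, 300_000) / n
rate_stirling = binary_entropy(0.3)
```

The two rates differ only by a correction of order $(\log n)/n$. In the bound above the entropy is additionally multiplied by $\tilde N / N = \tilde c / c$, because the exponent is measured per vertex of the original graph.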

This time is realized by our algorithm if the bound is
skipped, i.e. if all branches of the backtracking tree are
visited until the covering marks are exhausted. Using the
bound, our algorithm does not mark any two neighboring
vertices simultaneously as uncovered. This excludes
most of the $\binom{\tilde N}{\tilde x \tilde N}$ above-mentioned configurations, leaving
only an exponentially small fraction. So the simple
bound already causes an exponential speed-up. The
number of leaves fulfilling our criterion can be characterized
easily: Imagine a certain leaf is reached at level
$\kappa \tilde N$ of the backtracking subtree. Then our algorithm
has constructed a VC of the subgraph consisting of the
$\kappa \tilde N$ visited vertices, because edges between these are not
allowed to stay uncovered. Due to the random order of
levels in the backtracking tree, this subgraph is again a
random graph $G_{\kappa \tilde N, \tilde c/\tilde N}$ having average connectivity $\kappa \tilde c$.
We may thus conclude that the number of leaves at level
$\kappa \tilde N$ equals the total number $\mathcal{N}_{VC}(G_{\kappa \tilde N, \tilde c/\tilde N}, \tilde x \tilde N)$ of VCs of
$G_{\kappa \tilde N, \tilde c/\tilde N}$ using $\tilde x \tilde N$ covering marks. Summing over all
possible values of $\kappa$ leads to the saddle point

$$\tau(x,c) = \max_{\kappa = \tilde x, \ldots, 1} \lim_{N\to\infty} \frac{1}{N} \overline{\log \mathcal{N}_{VC}(G_{\kappa \tilde N, \tilde c/\tilde N}, \tilde x \tilde N)} . \qquad (6)$$



The average of $\log \mathcal{N}_{VC}$ over the random graph ensemble
can be calculated using the replica trick. In order
to avoid technicalities, we use the annealed bound
$\overline{\log \mathcal{N}_{VC}} \le \log \overline{\mathcal{N}_{VC}}$, which provides a very good
approximation. The latter average is calculated easily; we obtain

$$\tau(x,c) \simeq \frac{\tilde c}{c} \max_{\kappa = \tilde x, \ldots, 1} \left[ \kappa \, s_{ann}\!\left( \frac{\tilde x}{\kappa}, \kappa \tilde c \right) \right] , \qquad (7)$$
$$s_{ann}(x, c) = -x \log x - (1 - x) \log(1 - x) - \frac{c}{2} (1 - x)^2 ,$$

where $\tilde x$ and $\tilde c$ follow from the crossing point of (2) with
$x_c(c)$. In fig. 2 this result is compared with numerical
simulations. Due to the exponential time complexity, the
system sizes which can be treated are of course much
smaller than for the study of the first descent. In order
to eliminate strong logarithmic finite-size
dependencies, we have used the number of leaves
in these simulations; cross-checks using the number of
visited nodes in the backtracking trees also show the
expected behavior. Clear consistency of numerical and
analytical data is found. One also finds that the computational
complexity is maximal at $x = x_c(c)$ for both versions of the
algorithm, without and with the bound, as described by equations
(5) and (7).
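
The whole chain, from the starting point $(x, c)$ to the two exponents, can be sketched numerically as follows. Several ingredients are assumptions made for this illustration and are not quoted from the text: the closed form $x_c(c) = 1 - (2W + W^2)/(2c)$ for $c < e$, the explicit trajectory $x(t)$, a single crossing of the trajectory with $x_c(c)$ (located by bisection), and a simple grid search over $\kappa$. All function names are hypothetical, and the sketch is intended for $0 < x < x_b(c)$.

```python
import math


def lambert_w(c, tol=1e-13):
    """Principal branch W(c), c >= 0, via Newton iteration on w * exp(w) = c."""
    w = math.log1p(c)
    for _ in range(100):
        e = math.exp(w)
        step = (w * e - c) / (e * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def x_c(c):
    """Assumed closed form of the threshold, for 0 < c < e."""
    w = lambert_w(c)
    return 1.0 - (2.0 * w + w * w) / (2.0 * c)


def trajectory(x, c, t):
    """Point (c(t), x(t)) of the first descent."""
    c_t = (1.0 - t) * c
    x_t = (x - t + (math.exp(-c_t) - math.exp(-c)) / c) / (1.0 - t)
    return c_t, x_t


def crossing(x, c):
    """Return (x_tilde, c_tilde) where the first descent meets x_c(c)."""
    def gap(t):
        c_t, x_t = trajectory(x, c, t)
        return x_t - x_c(c_t)

    if gap(0.0) <= 0.0:
        return x, c  # already uncoverable at the start: t_tilde = 0
    lo, hi = 0.0, 1.0 - 1e-9  # gap(hi) < 0 as long as x < x_b(c)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    c_t, x_t = trajectory(x, c, 0.5 * (lo + hi))
    return x_t, c_t


def xlogx(y):
    return y * math.log(y) if y > 0.0 else 0.0


def s_ann(x, c):
    """Annealed entropy of vertex covers of relative size x at connectivity c."""
    return -xlogx(x) - xlogx(1.0 - x) - 0.5 * c * (1.0 - x) ** 2


def tau_without_bound(x, c):
    """Exponent of the binomial bound, i.e. the algorithm without the bound."""
    xt, ct = crossing(x, c)
    return -(ct / c) * (xlogx(xt) + xlogx(1.0 - xt))


def tau_with_bound(x, c, grid=2000):
    """Annealed exponent with the bound, maximised over kappa on a grid."""
    xt, ct = crossing(x, c)
    kappas = (xt + (1.0 - xt) * k / grid for k in range(grid + 1))
    return (ct / c) * max(k * s_ann(xt / k, k * ct) for k in kappas)


tau_bound = tau_with_bound(0.3, 2.0)
tau_plain = tau_without_bound(0.3, 2.0)
```

Evaluating both functions on a range of $x$ at fixed $c$ gives the kind of curves compared with simulations in fig. 2. Because each term of the maximum is at most the binomial entropy, the exponent with the bound never exceeds the one without it.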

Conclusion and outlook: To conclude, we have
calculated the typical solution time needed by a complete
backtracking algorithm for vertex covering on random
graphs. We have combined probabilistic methods used
in computer science for characterizing the first descent, 
and statistical mechanics methods which enabled us to 
calculate the phase transition threshold and the entropy 
of leaves. 

These results imply a very intuitive picture for the dif- 
ferent regimes of the typical computational complexity 
which is expected to share essential features with the be- 
havior of more complicated algorithms. The algorithm 
starts its first descent into the backtracking tree. There 
is some parameter range of the model inside the cover- 
able phase, where the first descent already successfully 
produces a VC and thus proves the coverability of the 
graph with the prescribed number of covering marks. In 
this region the solution time is found to be typically linear 
in problem size. If we lower the allowed number of cov- 
ering marks, the initial problem still remains coverable 



but the first descent into the tree generates an uncoverable
macroscopic subproblem. To escape from the corresponding
subtree, the algorithm has to backtrack and
consequently consumes exponential time. The maximum
backtracking tree appears when the initial problem is
situated exactly at the phase transition point $x = x_c(c)$;
there the exponential solution time shows its maximum,
as found also for other algorithms [?] or other combinatorial
problems [?]. The height of the time peak depends,
however, on the considered algorithm, and consequently
so does the maximal analyzable system size. In [?], we could
numerically solve systems up to $N = 140$, but the analysis
of this algorithm goes far beyond the presented methods.

Related to the depicted scenario, there are mainly two 
different possibilities of improving algorithms: 

(i) More sophisticated heuristics for the first descent
allow one to shift the onset of exponential complexity towards
$x_c(c)$. One important restriction to obtain algorithms
analyzable within the described scheme is that the order
of appearance of vertices in the backtracking tree must
be independent of the structure of the graph (e.g. of
the connectivities) in order to remain inside the random
graph ensemble.

(ii) The second possibility of improving algorithms is
given by the inclusion of more elaborate bounds into
the backtracking tree. These result in an exponential
speed-up of the algorithm, as we have already seen for
the simple bound used in our algorithm.

FIG. 1. Trajectories of the first descent in the $(c, x)$ plane.
The full lines represent the analytical curves, the symbols
numerical results of one random graph with $10^{?}$ vertices, $c = 2.0$
and $x = 0.8, 0.7, 0.6, 0.5$ and $0.3$. The trajectories follow the
sense of the arrows. The dotted line $x_b(c)$ separates the region
where this simple algorithm finds a cover from the region
where the method fails. No trajectory crosses this line. The
long dashed line represents the true phase boundary $x_c(c)$;
instances below that line are not coverable.

FIG. 2. Normalized and averaged logarithm of the running
time of the algorithm as a function of the fraction $x$ of
coverable vertices. The solid line is the result of the annealed
calculation. The symbols represent the numerical data for
$N = 12, 25, 50$; lines are guides to the eye only.



